On Probabilistic Inference Approaches to Stochastic Optimal Control
Author
Abstract
While stochastic optimal control, together with associated formulations such as Reinforcement Learning, provides a formal approach to, amongst others, motor control, it remains computationally challenging for most practical problems. This thesis is concerned with the study of relations between stochastic optimal control and probabilistic inference. Such dualities – exemplified by the classical Kalman Duality between the Linear-Quadratic-Gaussian control problem and the filtering problem in Linear-Gaussian dynamical systems – make it possible to exploit advances made within the separate fields. In this context, the emphasis in this work lies on the utilisation of approximate inference methods for the control problem. Rather than concentrating on special cases which yield analytical inference problems, we propose a novel interpretation of stochastic optimal control in the general case in terms of minimisation of certain Kullback-Leibler divergences. Although these minimisations remain analytically intractable, we show that natural relaxations of the exact dual lead to new practical approaches. We introduce two general iterative methods: Ψ-Learning, which has global convergence guarantees and provides a unifying perspective on several previously proposed algorithms, and Posterior Policy Iteration, which allows direct application of inference methods. From these, practical algorithms are derived for Reinforcement Learning, based on a Monte Carlo approximation to Ψ-Learning, and for model-based stochastic optimal control, using a variational approximation of Posterior Policy Iteration. In order to overcome the inherent limitations of parametric variational approximations, we furthermore introduce a new approach for non-parametric approximate stochastic optimal control based on a reproducing kernel Hilbert space embedding of the control problem.
Finally, we address the general problem of temporal optimisation, i.e., joint optimisation of controls and temporal aspects, e.g., duration, of the task. Specifically, we introduce a formulation of temporal optimisation based on a generalised form of the finite horizon problem. Importantly, we show that the generalised problem has a dual finite horizon problem of the standard form, thus bringing temporal optimisation within the reach of most commonly used algorithms.
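For reference, the Kullback-Leibler divergence mentioned in the abstract has the following standard definition for two probability densities. The symbols q and p are generic placeholders chosen here for illustration; they are not notation taken from the thesis, and the specific divergences it minimises are not reproduced here.

```latex
\[
  \mathrm{KL}(q \,\|\, p) \;=\; \int q(x)\, \log \frac{q(x)}{p(x)}\, \mathrm{d}x \;\ge\; 0,
  \qquad \text{with equality if and only if } q = p \text{ almost everywhere.}
\]
```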
Similar resources
Application of Stochastic Optimal Control, Game Theory and Information Fusion for Cyber Defense Modelling
The present paper addresses an effective cyber defense model by applying information fusion based game theoretical approaches. In the present paper, we are trying to improve previous models by applying stochastic optimal control and robust optimization techniques. Jump processes are applied to model different and complex situations in cyber games. Applying jump processes we propose some m...
Probabilistic Multi Objective Optimal Reactive Power Dispatch Considering Load Uncertainties Using Monte Carlo Simulations
Optimal Reactive Power Dispatch (ORPD) is a multi-variable problem with nonlinear constraints and continuous/discrete decision variables. Due to the stochastic behavior of loads, the ORPD requires a probabilistic mathematical model. In this paper, Monte Carlo Simulation (MCS) is used for modeling of load uncertainties in the ORPD problem. The problem is formulated as a nonlinear constrained mul...
On Stochastic Optimal Control and Reinforcement Learning by Approximate Inference (Extended Abstract)
We present a reformulation of the stochastic optimal control problem in terms of KL divergence minimisation, not only providing a unifying perspective of previous approaches in this area, but also demonstrating that the formalism leads to novel practical approaches to the control problem. Specifically, a natural relaxation of the dual formulation gives rise to exact iterative solutions to the f...
Probabilistic inference as a model of planned behavior
The problem of planning and goal-directed behavior has been addressed in computer science for many years, typically based on classical concepts like Bellman’s optimality principle, dynamic programming, or Reinforcement Learning methods – but is this the only way to address the problem? Recently there is growing interest in using probabilistic inference methods for decision making and planning. ...
A study of Morphological Computation by using Probabilistic Inference for Motor Planning
One key idea behind morphological computation is that many difficulties of a control problem can be absorbed by the morphology of the robot. The performance of the controlled system naturally depends on the control architecture and on the morphology of the robot. Ideally, adapting the morphology of the plant and optimizing the control law interact such that finally, optimal physical properties ...